Data Mining in Astronomical Databases
Abstract
A Virtual Observatory (VO) will enable transparent and efficient access, search, retrieval, and visualization of data across multiple data repositories, which are generally heterogeneous and distributed. Aspects of data mining that apply to a variety of science user scenarios with a VO are reviewed.

1 Science Requirements for Data Mining

What is data mining and why is it applicable to scientific research? Data mining is defined as an information extraction activity whose goal is to discover hidden facts contained in databases. Data mining has taken the business community by storm, and there is consequently now a vast array of resources and research techniques available for exploitation by the scientific communities. It is therefore useful to examine a categorization of data mining thrusts and their sub-components, since these are likewise applicable to the scientific exploration of large astronomical databases.

Data mining is used to find patterns and relationships in data by using sophisticated techniques to build models, which are abstract representations of reality. A good model is a useful guide to understanding that reality and to making decisions. There are two main types of data mining models: descriptive and predictive. Descriptive models describe patterns in data and are generally used to create meaningful subgroups or clusters. Predictive models are used to forecast explicit values, based upon patterns determined from known results.

There is another differentiation of data mining into two categories that we find particularly appropriate to knowledge discovery in large astronomical databases: event-based mining and relationship-based mining. At the risk of trivializing some fairly sophisticated techniques, we classify event-based mining scenarios into four orthogonal categories:
• Known events / known algorithms – use existing physical models (descriptive models) to locate known phenomena of interest either spatially or temporally within a large database.
• Known events / unknown algorithms – use pattern recognition and clustering properties of data to discover new observational (in our case, astrophysical) relationships among known phenomena.
• Unknown events / known algorithms – use expected physical relationships (predictive models) among observational parameters of astrophysical phenomena to predict the presence of previously unseen events within a large complex database.
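
The distinction drawn above between descriptive and predictive models can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, and the function names `two_means_1d` and `fit_line` are hypothetical. The descriptive step groups a one-dimensional catalog column into two clusters, and the predictive step fits a straight line to known results and uses it to forecast a new value.

```python
import random
import statistics

def two_means_1d(values, iterations=20):
    """Descriptive model: split 1-D values into two clusters (k-means with k=2)."""
    lo, hi = min(values), max(values)  # start the centroids at the extremes
    for _ in range(iterations):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = statistics.fmean(near_lo), statistics.fmean(near_hi)
    return lo, hi

def fit_line(xs, ys):
    """Predictive model: least-squares line fitted to known (x, y) results."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Synthetic stand-in for one catalog column (for example a color index)
# drawn from two made-up populations; not real survey data.
rng = random.Random(0)
colors = ([rng.gauss(0.3, 0.05) for _ in range(200)]
          + [rng.gauss(1.2, 0.05) for _ in range(200)])
centroids = two_means_1d(colors)  # descriptive: where the subgroups sit

# Made-up "known results" used to forecast an explicit value at x = 4.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
predicted = slope * 4.0 + intercept  # predictive: forecast from the fit
```

In a real archive the clustering would run over many parameters at once and the predictive model would encode an expected physical relationship; the sketch only shows the difference in what the two kinds of model return.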
Similar resources
Science User Scenarios for a Virtual Observatory Design Reference Mission: Science Requirements for Data Mining
The knowledge discovery potential of the new large astronomical databases is vast. When these are used in conjunction with the rich legacy data archives, the opportunities for scientific discovery multiply rapidly. A Virtual Observatory (VO) framework will enable transparent and efficient access, search, retrieval, and visualization of data across multiple data repositories, which are generally...
Artificial intelligence tools for data mining in large astronomical databases
The federation of heterogeneous large astronomical databases foreseen in the framework of the AVO and NVO projects will pose unprecedented data mining and visualization problems, which may find a rather natural and user-friendly answer in artificial intelligence (A.I.) tools based on neural networks, fuzzy-C sets or genetic algorithms. We briefly describe some tools implemented by the AstroNeura...
Distributed Information Search and Retrieval for Astronomical Resource Discovery and Data Mining
Information search and retrieval has become by nature a distributed task. We look at tools and techniques which are of importance in this area. Current technological evolution can be summarized as the growing stability and cohesiveness of distributed architectures of searchable objects. The objects themselves are more often than not multimedia, including published articles or grey literature re...
Automated Clustering Algorithms for Classification of Astronomical Objects
Data mining is an important and challenging problem for the efficient analysis of large astronomical databases and will become even more important with the development of the Global Virtual Observatory. In this study, learning vector quantization (LVQ), single-layer perceptron (SLP) and support vector machines (SVM) were put forward for multi-wavelength data classification. A feature selection ...
Journal: CoRR
Volume: astro-ph/0010583
Publication date: 2000